{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Question-Answering Task \n",
    "\n",
    "In the question answering task, we will be given a question along with a paragraph containing an answer to the question. Our goal is to extract the answer from the paragraph for a given question. Now, let's learn how to finetune the pre-trained BERT for performing the question-answering task. \n",
    "\n",
    "The input to the BERT model will be the question-paragraph pair. That is, we feed a question and also a paragraph containing the answer to the question to the BERT and it has to extract the relevant answer from the paragraph according to the given question. So, essentially the BERT has to return the text span containing an answer from the paragraph. Let's understand this with an example, consider the following question-paragraph pair:\n",
    "\n",
    "Question = \"What is the immune system?\"\n",
    "\n",
    "Paragraph = \"The immune system is a system of many biological structures and processes within an organism that protects against disease. To function properly, an immune system must detect a wide variety of agents, known as pathogens, from viruses to parasitic worms, and distinguish them from the organism's own healthy tissue.\"\n",
    "\n",
    "Now, our model has to extract an answer from the paragraph, so it essentially has to return the text span containing the answer. So it will return the following:\n",
    "\n",
    "Answer = \"a system of many biological structures and processes within an organism that protects against disease\"\n",
    "\n",
    "Okay, now how can we finetune the BEERT model to do this task? To do this, our model has to understand the starting and ending index of the text span containing the answer in the given paragraph. For example, in the question, 'What is the immune system?'. If our model understands that the answer to this question starts from index 4 (\"a\") and ends at index 21 (\"disease\") then we can get the answer as shown in the following:\n",
    "\n",
    "Paragraph = \"The immune system is a system of many system of many biological structures and processes within an organism that protects against disease\" biological structures and processes within an organism that protects against disease. To function properly, an immune system must detect a wide variety of agents, known as pathogens, from viruses to parasitic worms, and distinguish them from the organism's own healthy tissue.\"\n",
    "\n",
    "Now, how to find this starting and ending index of the text span containing the answer? If we get the probability of each token (word) in the paragraph to be the starting and ending token (word) of the answer, then we can easily extract the answer, right? Yes, but how we can achieve this? To do this, we use two vectors called the start  vector $S$ and end vector $E$.  The values of start and end vector will be learned during training. \n",
    "\n",
    "\n",
    "\n",
    "\n",
    "First, we compute the probability of each token (word) in the paragraph being the starting token of the answer. To compute this probability, for each token $i$, we compute the dot product between the representation of the token $R_i$ and the start vector $S$. Next, we apply the softmax function to the dot product $S.R_i$ and obtain the probability:  \n",
    "\n",
    "$$P_i = \\frac{e^{S.R_i}}{\\sum_j e^{S.R_j}} $$\n",
    "\n",
    "Next, we compute the starting index by selecting the index of the token which has a high probability of being the starting token. \n",
    "\n",
    "In a very similar fashion, we compute the probability of each token (word) in the paragraph being the ending token of the answer. To compute this probability, for each token $i$, we compute the dot product between the representation of the token $R_i$and the end vector $E$. Next, we apply the softmax function to the dot product $E.R_i$ and obtain the probability:  \n",
    "\n",
    "\n",
    "$$P_i = \\frac{e^{E.R_i}}{\\sum_j e^{E.R_j}} $$\n",
    "\n",
    "Next, we compute the ending index by selecting the index of the token which has a high probability of being the ending token. Now, we can select the text span containing answering using the starting and ending index.  \n",
    "\n",
    "As shown in the following figure, first, we tokenize the question-paragraph pair and feed the tokens to the pre-trained BERT which returns the embeddings of all the tokens. As shown below, $R_1$ to $R_N$ denotes the embeddings of the tokens in the question and  $R'_1$ to $R'_M$  denotes the embedding of the tokens in the paragraph.\n",
    "\n",
    "After computing the embedding, we compute the dot product with the start/end vectors, apply the softmax function, and obtain the probabilities of each token in the paragraph being the start/end word as shown below:\n",
    "\n",
    "\n",
    "\n",
    "![title](images/9.png)\n",
    "\n",
    "From the above figure, we can observe how we compute the probability of each token in the paragraph being the start/end word. Next, we can select the text span containing answering using the starting and ending index which has a high probability. To get a better understanding of how this works, let's see how to use the finetuned Q&A BERT in the next section. \n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
